Adding first stab at a first draft of a proposal for campaign finance… #61

Merged
merged 17 commits into from
Feb 3, 2017

Conversation

@aepton aepton commented Dec 7, 2016

… filing models

Just wanted to make sure I was on something like the right track before I filled in more details.

fgregg commented Dec 7, 2016

Thanks for starting on this! paging @boblannon @evz @palewire, @gordonje

When I look at a filing document, I want to know two things

  1. the information contained within this filing
  2. the business logic that determines what information this filing should include. For example, contributions over $250 should be itemized and include name, address, date of contribution.

Knowing the business rules is critical to interpreting the information in a filing, but I think those business rules should be represented separately.

I've been thinking about a FilingType model that a Filing object has a relation to. These business rules change over time, while the name of a filing type sometimes does not, so it would be important to be able to represent the business rules as being connected to certain time spans.
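
A minimal sketch of that idea, assuming hypothetical `FilingRule` and `FilingType` classes that are not part of the proposal. The dates and the $500 threshold are invented for illustration; only the $250 itemization example comes from the comment above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class FilingRule:
    """One business rule, valid within [start, end). Hypothetical model."""
    description: str
    start: date
    end: Optional[date] = None  # None means the rule is still in force

    def applies_on(self, day: date) -> bool:
        return self.start <= day and (self.end is None or day < self.end)


@dataclass
class FilingType:
    """A named kind of filing whose rules can change while the name stays."""
    name: str
    rules: List[FilingRule]

    def rules_on(self, day: date) -> List[FilingRule]:
        return [rule for rule in self.rules if rule.applies_on(day)]


# Illustrative values only: the time spans and the $500 rule are invented.
quarterly = FilingType(
    name="Quarterly report",
    rules=[
        FilingRule("Itemize contributions over $250", date(2010, 1, 1), date(2015, 1, 1)),
        FilingRule("Itemize contributions over $500", date(2015, 1, 1)),
    ],
)
```

A Filing would then only need a reference to its FilingType and its own filing date to look up which rules were in force, without storing any rules itself.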

aepton commented Dec 7, 2016

Do we even want to represent the business rules at all? That seems like a lot of extraneous work, in particular since, as you said, the rules change all the time.

I was thinking that we want to have a really loose/barebones model for a filing itself, and then some FilingType that describes that filing, but not encode the rules at all. The rules would be implicit based on the data contained in the filing - this Filing, of type ILContributionReport, contains Contributions. That's what we're getting from the Regulator, so that's kind of all that matters - if the rules say "this has to include all contribs over $250", we can't enforce those rules or even (in many cases) know if they're being violated, and keeping up to date with all the legislative changes would be quite a pain.

fgregg commented Dec 7, 2016

@aepton I think that you are right that a Filing object need not be dependent on the existence of FilingType object.

That said, it still might be useful to think about these as distinct models (even if we don't get around to implementing FilingTypes), because it might help us avoid putting business rules into Filing objects.

@palewire

I'm thrilled to see this ball rolling. @aepton, when you're ready for comments on your early submission please let me know.

aepton commented Dec 12, 2016

Ok, I think this captures where I'd like to start the conversation. Please have at it with any and all types of suggestions/tweaks/fixes/jaw-droppingly-obvious omissions/subtle whatevers/I should probably just end this sentence, you get it.

fgregg left a comment

I've mainly noted things that do not need to be part of this proposal because they already exist in OCD.

**optional**
Date (and possibly time) when filing period of coverage ends.

filing_regulator
Contributor

This probably should be from_organization to keep consistency with existing model https://github.com/opencivicdata/docs.opencivicdata.org/blob/master/proposals/0006.rst#implementation

Contributor Author

SGTM

filing_committee
Committee

filing_date
Contributor

Submissions by committees, publication by regulators, and amendments should be actions, like those on bills.

Contributor Author

Ok, so should it use the OCD model directly; import and extend it here; or should there be a FilingActivity type that covers submission and amendments?

Contributor

Just suggesting you use same pattern as bills.



actions

    A list of objects representing individual actions that take place on a bill, comprising the legislative history of the proposal in question. Actions consist of the following properties:

    organization, organization_id
        The organization that this action took place within.
    description
        Description of the action.
    date
        The date the action occurred in YYYY-MM-DD format. (can be partial by omitting -MM-DD or -DD component).
    classification
        A list of classifications for this action; suggested values would be things like 'passage', 'introduction', etc.
    related_entities

        A list of all related entities (such as legislators mentioned by name in the action). Each entity has the following fields:

        name
            The upstream-given name of this related entity.
        entity_type
            'organization' or 'person' - the type of entity that is related
        organization, organization_id
            If the entity_type is 'organization' and the entity is resolved, will be the organization that is related.
        person, person_id
            If the entity_type is 'person' and the entity is resolved, will be the person that is related.

filing_regulator
Regulator

filing_url
Contributor

what does this indicate?

palewire commented Dec 12, 2016

In some cases, like California, there are PDF or HTML documents online that can be linked via the database record's unique identifier.

For instance, the most recent filing by our governor's campaign committee contains a set of contributions made to ballot measure committees this past election day.

In the database, the filing's unique identifier is 2106282, which can be combined with a common URL pattern to return the PDF version of a paper filing. (A small side note: The filing's amendment identifier is a crucial second component needed to guarantee uniqueness for all California records)

http://cal-access.ss.ca.gov/PDFGen/pdfgen.prg?filingid=2106282&amendid=0

screenshot from 2016-12-12 09-14-02

Including this link in the OCD schema may not be mandatory, but from a practical point of view I can vouch for the fact that reporters and data journalists are constantly referring back to PDF records like these when analyzing campaign finance to verify and further scrutinize their findings.
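
The URL pattern described above can be sketched as a small helper. The base URL and parameter names are taken from the example link; treating them as a stable pattern for every California filing is an assumption, and `filing_pdf_url` is a hypothetical name.

```python
from urllib.parse import urlencode

# Base URL copied from the example link above (assumed to hold for other filings).
CAL_ACCESS_PDF_URL = "http://cal-access.ss.ca.gov/PDFGen/pdfgen.prg"


def filing_pdf_url(filing_id: int, amend_id: int = 0) -> str:
    """Build the PDF link from the filing identifier and amendment identifier."""
    query = urlencode({"filingid": filing_id, "amendid": amend_id})
    return f"{CAL_ACCESS_PDF_URL}?{query}"


print(filing_pdf_url(2106282, 0))
```

Both identifiers are required inputs here because, as noted, the amendment identifier is needed to make a California record unique.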

Contributor

Okay, so it looks like you want this to indicate a source. This is what we have for bills:

sources

    List of sources used in assembling this object. Has the following properties:

    url
        URL of the resource.
    note
        optional Description of what this source was used for.

Contributor Author

Ok, cool. Adopting this here.

filing_relevant_election_date
Date of (nearest? next?) relevant election.

filing_person
Contributor

can you give an example of this?

Contributor Author

I think generally, if not exclusively, this will be the date of the upcoming election, but I guess sometimes filings could be made relative to just-concluded elections.

So like, a declaration of candidacy for a specific upcoming election; then a contestation of results after the election has concluded (but clearly still referring to that specific election).

Contributor

Sorry, I meant "filing person" not "filing_relevant_election_date"

Contributor Author

Oh, like the treasurer or whoever signs off on a given campaign disclosure.

Contributor

Okay. Give some examples of who this type of person could be in the proposal.

**optional**
Person responsible for the filing.

Committee
Contributor

everything here can be done with a normal OCD organization with possible exception of purpose and candidates (but see my notes about candidates).

Contributor Author

Yeah, I think that's fine. What's the model for repurposing existing OCD types? We need a Committee model, so whatever's the best way to get Officers, Purpose and Status into an OCD Organization I'm fine with.

Contributor

We don't need to do anything special for

  • officers (already handled in OCD Post model).
  • status: the OCD org model has start and end dates. Is this sufficient?
  • purpose... this one is tricky. Let's create a section in this PEP for miscellaneous questions and stick that one in there for now.

Contributor Author

No, I don't think OCD start/end dates quite cover what we need here, I'll add this to the questions.

Contributor

Please feel free to ping me on the PEP for "purpose". Toronto committees tend to have a short "focus" that the city likes to use, so I'd be interested to track this :)

Contributor

Also, what might be the options for "status"?

Contributor Author

Status is primarily "active"/"inactive" but I think in some states, some active Committees still have to file to announce whether they're contesting anything in a particular election. That seems like a status.

palewire commented Dec 14, 2016

I'm not sure if we can infer the current status in 100% of cases, but the "active vs. inactive" distinction definitely exists in the California data, where committees can file "termination reports" that put themselves out of business.

Contributor

It seems like popolo's "date of founding" and "date of dissolution" are sufficient here. If the committee has to file a notice of intent to contest, that seems like it should be handled by a filing.

Contributor Author

I think there's a bit more nuance here, which you got at with the filings that indicate notice of contestation - there are a series of windows that apply to the status of a committee. Updating PR to reflect this.

name
Name of the Committee

candidate
Contributor

I'm not sure this is the right place for it. There can be a many-to-many relation between committees and candidates. I think "Candidate Support" should maybe be a separate model.

Contributor Author

Yeah, that's a good point. Candidate Support/Opposition, really, so I guess Candidate Orientation. I'll add this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A good example from California would be ballot measure committees which can form to support one-to-many propositions, and can evolve over subsequent elections to support a variety of measures over time.

on a given election day. I suppose it's possible some Candidates won't have
Regulators (God help us all).

Jurisdiction
Contributor

Contributor Author

Cool.

**repeated**
Government Level with...jurisdiction over this Jurisdiction.

Office
Contributor

Contributor Author

Cool.

name
Name of the Party.

Regulator
Contributor

This can just be an OCD organization.

Contributor Author

Just Party, or both Regulator and Party?

Contributor

Both.

**optional**
If this is a primary, each Party involved in this Election.

GovernmentLevel
Contributor

What is the difference between a GovernmentLevel and a Jurisdiction?

Contributor Author

I was thinking of GLs as not specific instances of governmental bodies but as the levels themselves - federal, state, municipal, county, tribal, etc.

Contributor

What's an example of a case where you know the jurisdiction but you still want to know the "government level"?

Contributor Author

Guess I can't think of one. Removing it.

fgregg commented Dec 12, 2016

@aepton thanks for this great start. I have three general comments at this point.

  1. I think that many of the models that you describe could already be covered by existing OCD models; I've noted the relevant models as in-line comments.

  2. I think the Election and Candidate models should be pulled out into a separate PR. It would be great to get some of the openelex folks to look at that separately.

  3. There seem to be two philosophies of how to represent contributions and expenditures. Personally, I think that we should treat the contributions that are represented in a filing as claims, not facts. I think that we should represent the data as entered, with minimal inferences about the identity or veracity of those claims.

Basically, this comes down to representing a filing as a denormalized row versus as relations between modelled entities. I would strongly prefer the denormalized representation.

I think that we can attach an optional ocd-person-id and ocd-organization-id to the denormalized representations to make downstream processing much easier.
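
A rough sketch of what such a denormalized row could look like, assuming a hypothetical `ReportedContribution` class and field names that are not part of the proposal. The reported values are kept as entered, and the two optional identifiers are the only link to modelled entities.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReportedContribution:
    """A contribution as claimed in a filing, with strings kept as entered."""
    filing_id: str
    contributor_name: str
    contributor_address: str
    amount: str
    date: str
    # Optional links, filled in downstream once an entity has been resolved.
    ocd_person_id: Optional[str] = None
    ocd_organization_id: Optional[str] = None

    @property
    def is_resolved(self) -> bool:
        return bool(self.ocd_person_id or self.ocd_organization_id)


# Placeholder values, invented for illustration.
row = ReportedContribution(
    filing_id="example-filing",
    contributor_name="JANE Q PUBLIC",
    contributor_address="123 MAIN ST",
    amount="250.00",
    date="2016-10-01",
)
```

Leaving both identifiers empty is a valid state, so a filing can be published before anyone has made a claim about who the contributor really is.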

aepton commented Dec 12, 2016

Thanks, Forrest, this is really awesome and helpful. Updating this PR now and I'll pull out Election and Candidate into a separate proposal. I'm with you on modeling contributions and expenditures - as a data utility, we want to do nothing beyond providing what other folks are claiming. Then as journalists or whomever, we can use this data to model things and make assertions and inferences - and I want to make the latter as easy as possible without compromising the design of the former.

Implementation
==============

Filing
palewire commented Dec 12, 2016

If I'm reading the Filing schema right, I see there are four date fields currently listed:

| field | definition |
| --- | --- |
| filing_date | Date (and possibly time) when filing was submitted. |
| filing_coverage_begin_date | Date (and possibly time) when filing period of coverage begins. |
| filing_coverage_end_date | Date (and possibly time) when filing period of coverage ends. |
| filing_relevant_election_date | Date of (nearest? next?) relevant election. |

Here are some thoughts I have on those:

  • So far they perfectly match the four date fields we currently have drafted on our early attempts at cleaning up California's filings, though we've given them slightly different names: date_filed, from_date, thru_date and election_date. I like your longer ones better for being more specific, but I wonder if there is a common date naming convention for other OCD schema we should be emulating. Is there?

  • Pedantic: There might be some cases where the state systems record a slight variation between when a filing is "submitted" by the filer and when it is "received" by the government. Do we need to worry about that distinction?

  • More serious: While I expect the "from" and "thru" fields will be common to most periodic campaign disclosures (like quarterly committee filings), I can imagine some other common campaign disclosures like late contribution reports, statements of intention (to run for office) and statements of committee organization or termination that do not have them. Is the aim of this schema to serve as a subclass for these other campaign-related forms, or only for periodic financial disclosures?

Contributor Author

To your second question, I think whatever system is ingesting these reports and preparing them in this format should be responsible for deciding what corresponds the best to "filing submitted". If this system is handling claims political entities are making about the world, then "time they submitted their claims to a regulator" seems like the most useful approximation of "submission time".

To your final question, I think this schema should be able to handle both types, and any other coverage window information should be optional (as are the filing_coverage begin and end dates). A (perhaps) separate (and much harder) question is how to reconcile multiple reports that describe the same contribution/expenditure/event.
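
As a sketch of how optional coverage dates might work, assuming a hypothetical `Filing` dataclass that reuses the field names from the table above: a periodic report fills in both coverage dates, while a statement of intention or organization leaves them empty.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Filing:
    filing_date: date
    filing_coverage_begin_date: Optional[date] = None
    filing_coverage_end_date: Optional[date] = None

    def __post_init__(self):
        begin, end = self.filing_coverage_begin_date, self.filing_coverage_end_date
        # Only check the ordering when both ends of the window were reported.
        if begin and end and begin > end:
            raise ValueError("coverage period begins after it ends")

    @property
    def has_coverage_period(self) -> bool:
        return (
            self.filing_coverage_begin_date is not None
            and self.filing_coverage_end_date is not None
        )


# Invented dates for illustration.
periodic = Filing(date(2016, 10, 15), date(2016, 7, 1), date(2016, 9, 30))
statement = Filing(date(2016, 3, 1))
```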

Contributor

  • If we treat submission, reception, and publishing by the regulator as actions, that resolves many difficulties. See: Adding first stab at a first draft of a proposal for campaign finance… #61 (comment)

  • I'm a little wary about putting the nearest election date as an object on the filing, since that's typically not something that appears in the actual filing documents I've seen. I think that we can definitely make convenient queries, though, to surface the same content.

@LindsayYoung

It would be nice if this description would be extensible. For US federal data, there are two main characteristics that determine what rules the filer has to abide by. For example, you need the committee type and committee designation to determine if a committee is a Super PAC. In addition to this, you can also see the org type to find organizations like labor unions, corporations or individuals.
Here is the FEC API's basic schema for committees, if that is a helpful reference. We include all filers and not just those that are technically a committee.

Sometimes it makes sense to have an election date if that is something you want to represent from a line on a form, like the Form 3, but there are a lot of situations where this wouldn't apply. Especially in the primary season, one filing can apply to more than one election, and those elections can fall on multiple days. Additionally, the election dates change, so if you add a day to each filing you will have to amend the election day on past filings or know that the same election can be represented as different days. We generally track the election type (primary, general, special or runoff) and the office that the election is for. You can then cross-reference that election with a date from the date endpoint.

I think that the election is generally more useful on the transaction level. That is where you can see if a donation is attributed to the primary or general etc., which is important for keeping track of donation limits per donor, since those are per-election. Again knowing the election type and office is better than having the date.
For reference here is the FEC API's basic schema for filings, though I would like to note, we would like to move toward a schema that separates the summary financial information from the filings information.
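
To illustrate why the election belongs on the transaction, here is a small sketch that totals donations per donor, election type, and office and flags totals above a limit. The rows, the office code, and the 2700 limit are invented for illustration; the field names are hypothetical and not taken from the FEC API.

```python
from collections import defaultdict

# Hypothetical transactions tagged with election type and office.
transactions = [
    {"donor": "A. Donor", "election_type": "primary", "office": "H", "amount": 2000},
    {"donor": "A. Donor", "election_type": "primary", "office": "H", "amount": 1500},
    {"donor": "A. Donor", "election_type": "general", "office": "H", "amount": 2000},
]


def totals_per_election(rows):
    """Sum amounts for each (donor, election type, office) combination."""
    totals = defaultdict(int)
    for row in rows:
        key = (row["donor"], row["election_type"], row["office"])
        totals[key] += row["amount"]
    return dict(totals)


def over_limit(rows, limit):
    """Return the (donor, election type, office) keys whose total exceeds the limit."""
    return sorted(key for key, total in totals_per_election(rows).items() if total > limit)


print(over_limit(transactions, 2700))
```

Because the key uses election type and office rather than a date, a rescheduled election does not require touching any stored transaction.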

There are also some practical examples of how other smart people approach US federal data with the fetch project.

I'm not nearly as knowledgeable about state or international campaigns, so feel free to ignore anything I say that isn't as widely applicable. (Also I am just commenting in my personal capacity.)

palewire commented Dec 16, 2016

FWIW, a quarterly filing by a candidate committee in the state of California will include the "date of election." If you look closely, or click here on a 2014 filing by our governor's reelection campaign, you can see it at the top center of the first page.

screenshot from 2016-12-15 16-49-12

As @LindsayYoung points out, this is not structured data. It is only a date string, which of course could be fallible. @gordonje is in the process of vetting the California data right now to get a grip on the links between elections, filers and committees, but I don't think we have a comprehensive answer on how reliable that information has been at the filing level.

palewire commented Dec 16, 2016

@LindsayYoung Would you mind expanding a little more on why you think summary information should be separated from its filing? Potentially including that here was one thing @gordonje and I had pondered, but we lack your depth of experience wrestling with these issues. So I'm curious to hear your thoughts.

@jsfenfen

Not sure if this is what Lindsay had in mind, but definitely agree about the utility of breaking out summary information with federal data for analysis. For example, presidential committees file on F3P; candidates on F3, pacs on F3X. Some filers report fundraising and spending, while others only report spending. But most folks don't really want financial details for only one of these forms, but for all of them. So it's really useful to have a standardized form of common elements (because there are so many that can't be standardized, the federal summary forms are really wide). Whether that sorta thing is within the scope of this doc, or how it fits into OCD is kinda over my head.

Contributor Author

Ok, so it sounds like we should have a repeated, optional field indicating which election(s) this filing applies to, and since we're also introducing the notion of an Election object in a separate PR, this seems straightforward.

@LindsayYoung does that capture what you were looking for in terms of flexibility? I'd absolutely love to be able to use this to model federal elections/data.

@jsfenfen I think what you're describing is what I have in mind for how this system would work for an end user - we do the work of translating federal/state/local filings into this standard scheme, and each state/federal/local parser is responsible for doing The Right Thing for that jurisdiction, such that it's easy for people to do cross-jurisdictional comparisons.

For an end user, there should be some interface to this data, and since the jurisdictional parsers have done the heavy lifting of saying that "F3P gets processed this way, and F3X gets processed that way" then the end-user-interface system can make the comparisons users have in mind in a straightforward way.

@LindsayYoung

Having the election as an optional, separate object sounds good.

@palewire The summary information can get pretty long and varies from filing to filing. The time frame for summary information can also vary depending on the type of filer. The totals for the coverage period are pretty straightforward, but there are also cumulative totals, which run by calendar year for PACs and parties on Form 3X but longer for candidate committees on Form 3 and Form 3P, to match up with their respective election cycles. Also, the financial information isn't applicable to all the forms, and that can confuse people. We have seen that cause confusion that outweighs the convenience of having those financials there.

**optional**
Person responsible for the filing.

Committee

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would different committee types be handled?

Here is a short list of the different types of committees from California I can pull from the top of my head:

  • Candidate committees that support the election of a candidate, which take in money to run a focused campaign for a particular office and can be remade for different election cycles. An example would be Brown for Governor 2014.
  • Independent recipient committees that take in money from supporters and disburse it to candidate committees and independent political spending. They can exist for decades. These are sometimes known as "PACs" in federal parlance and can include corporate committees like Exxon Mobile, unions like Sheet Metal Workers Local 206 and what in some venues might be called "Super PACs" or "independent expenditure committees."
  • Ballot measure committees that can support or oppose the passage of one to many propositions in one to many elections over time. An example would be Yes on 62, No on 66. Replace the costly, failed death penalty system.
  • Candidate-controlled ballot measure committees or leadership PACs that allow candidates to raise money for their favored causes besides their own election. An example would be Brown's Ballot Measure Committee.
  • State and local political party committees that raise money on behalf of political parties and move money into key and favored races as well as supporting general activities for the party. An example would be the California Republican Party.

California has a couple other ones as well, like Slate Mailer Committees and Small-Contributor Committees, but I'm not sure how common those are in other jurisdictions. All of the above I expect to be common across the country.

Contributor Author

I had been thinking they'd be imputed based on the orientation(s) a Committee takes toward one or more Candidate(s). But this is a good question and one I've added to the list. It seems not obvious how best to handle this, since every jurisdiction will have different types.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing to keep in mind is that a committee might be connected with a ballot measure rather than a candidate. We should probably consider a schema for those as well to go along with the Election objects.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I almost forgot my favorite committee type: The legal defense fund!

Contributor

I think we do need a committee type, because different types of committees have very different rules on who they can contribute to, how, and when they need to disclose.

Contributor

Popolo has a classification attribute, but within OCD these have typically been used for pretty high-level classes, like legislature, executive, community board. Maybe we should have a type in addition to a classification.

Do you have thoughts on this question, @jpmckinney ?

@jsfenfen

Are there states with hybrid super pacs yet (aka carey committees)--two separate accounts, but only one committee? Not sure how many of these matter (esp. at the state level) but do you care about multi-candidate fundraising committees? Inaugural committees? Campaign cost committees? Convention committees? Dedicated accounts, like building, legal?

palewire commented Dec 16, 2016

If I'm reading the FEC committee spec provided by @LindsayYoung properly, I believe the issue you raise @jsfenfen is addressed there by first classifying a committee with its "committee type" (e.g. Independent expenditure, Candidate) and then recording details about its relationship to candidates via its "designation" (e.g. Belonging to candidate, Authorized by candidate, Joint fundraising, etc.). Maybe that's something we should do here as well?
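
A minimal sketch of that two-part classification, using the example values named in the comment above. The class names, the enum string values, and the sample committees are assumptions for illustration and do not reproduce FEC codes.

```python
from dataclasses import dataclass
from enum import Enum


class CommitteeType(Enum):
    CANDIDATE = "candidate"
    INDEPENDENT_EXPENDITURE = "independent_expenditure"


class Designation(Enum):
    BELONGS_TO_CANDIDATE = "belongs_to_candidate"
    AUTHORIZED_BY_CANDIDATE = "authorized_by_candidate"
    JOINT_FUNDRAISING = "joint_fundraising"


@dataclass
class Committee:
    name: str
    committee_type: CommitteeType  # what kind of committee it is
    designation: Designation       # how it relates to candidates

    @property
    def is_candidate_linked(self) -> bool:
        return self.designation in {
            Designation.BELONGS_TO_CANDIDATE,
            Designation.AUTHORIZED_BY_CANDIDATE,
        }


# Invented committees for illustration.
own = Committee("Example for Governor", CommitteeType.CANDIDATE, Designation.BELONGS_TO_CANDIDATE)
joint = Committee("Example Victory Fund", CommitteeType.CANDIDATE, Designation.JOINT_FUNDRAISING)
```

Keeping the two attributes separate lets each jurisdiction extend either list without multiplying combined committee types.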

Contributor

This is what @aepton is planning on doing with CandidateOrientation. I think that's the right idea, but the name isn't quite right yet.

Contributor Author

Ok, sounds like the name for CandidateOrientation should be CandidateDesignation, but otherwise the basic idea seems like it handles these use cases.


memo
String (may simply need repeated "notes" fields for items of this type).

I think for both Contribution and Expenditure it would be cleaner to come up with something that mirrors the concept of what the FEC calls a "transaction_type," which describes the kind of thing it is - in-kind, transfer to an affiliated committee, refund, etc. Otherwise we face having to define the various is_{some type} fields, which seems more complicated.

Something else to consider: at the federal level, at least, there is a distinction between "receipt" and "contribution", with the latter being an intentional donation. All contributions are receipts, but there are receipts (offsets, investment income) that are not contributions.

Contributor

Strongly agree that receipt is a better abstraction than contribution. Also like the 'type' idea.

@palewire palewire Dec 14, 2016

There are several other transaction types not captured here in the California data, like loans, debts and "miscellaneous" transfers between committees. That last one is a great place to find millions from Soros, George.

There is also, as I know @dwillis knows well, the key distinction commonly made between monetary and non-monetary contributions. Which gets at a secondary level of classification that may exist with any of the "receipt" types.

Contributor Author

This seems like a really good change; updating.
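
As a rough sketch of the "type" idea discussed above, a single classification field can replace a growing set of is_{some type} flags, and the receipt/contribution distinction falls out of it. The enum members, the `Receipt` class, and the choice of which types count as contributions are illustrative assumptions, not part of the proposal or of the FEC's transaction_type list.

```python
from dataclasses import dataclass
from decimal import Decimal
from enum import Enum


class TransactionType(Enum):
    # Illustrative values only; each jurisdiction would supply its own list.
    CONTRIBUTION = "contribution"
    IN_KIND = "in-kind"
    TRANSFER = "transfer"
    REFUND = "refund"
    LOAN = "loan"
    INVESTMENT_INCOME = "investment-income"


# Assumed set of receipt types that are intentional donations.
CONTRIBUTION_TYPES = {TransactionType.CONTRIBUTION, TransactionType.IN_KIND}


@dataclass
class Receipt:
    amount: Decimal
    type: TransactionType

    @property
    def is_contribution(self) -> bool:
        # All contributions are receipts, but not every receipt is a contribution.
        return self.type in CONTRIBUTION_TYPES
```

With this shape, adding a new kind of transaction means adding one enum member rather than another boolean column.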

memo
String (may simply need repeated "notes" fields for items of this type).

Amendment (Section)

I'm not sure of the proper verbiage to use here, but I'm not sure that an amendment that fully replaces a previous version of a filing can be rightfully called a "section" of that same filing.

Contributor

@gordonje gordonje Dec 14, 2016

Here's a suggestion:

  • filings have something like a "version_count", indicating how many different versions of the filing are known to exist (in CA these count up from zero, not sure if that works everywhere).
  • each section has a "filing_version" attribute, indicating in which version of the filing all the truth claims contained in the section were made.

Contributor Author

"Section" may not be the right abstraction for an amendment. I like @gordonje's notion of versions, though I don't know if we need version counts - we primarily care about whatever the current version is. Secondarily, for a given claim we want to be able to say which version(s) it comes from.

Maybe we should think of amendments as a linked list, allowing you to get back to previous versions and superseding versions of the same filing from wherever you are.

Should amendments (in our system) contain all the previous data from prior versions of a filing, or just a diff? I'm inclined toward the former, but could be talked out of it - seems like it's conceptually more straightforward and would be easier to use, primarily coming at the cost of (cheap and ever-cheaper) storage.

I'll add a question about how to handle amendments, and leave a stub for them in the current PR.

Contributor

@aepton I also definitely prefer having amendments contain all previous data from prior versions instead of just the diff.

I was thinking "version_count", or whatever you would call it, would facilitate a simple sanity check about each contribution, expenditure or other claim, like they are not being attributed to a version of the filing that isn't known to exist. But that might be overkill for a lot of people.
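
A minimal sketch of that sanity check, assuming versions are numbered from zero as in California and that each claim carries a "filing_version" value. The function name and the dict layout are hypothetical.

```python
def unknown_version_claims(version_count, claims):
    """Return claims attributed to a filing version that is not known to exist.

    Assumes versions are numbered 0 .. version_count - 1.
    """
    return [c for c in claims if not 0 <= c["filing_version"] < version_count]


claims = [
    {"id": "contrib-1", "filing_version": 0},
    {"id": "contrib-2", "filing_version": 1},
    {"id": "contrib-3", "filing_version": 3},  # no such version
]
bad = unknown_version_claims(2, claims)
```

Here `bad` holds only the claim pointing at version 3, since a filing with two known versions has versions 0 and 1.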

------

id
Open Civic Data-style id in the format ``ocd-cf-filing/{{uuid}}``

@palewire palewire Dec 14, 2016

In California, the unique identifier of a filing is a combination of its "filing id" and its "amendment id." The first version of a filing has 0 as its amendment id. That number increments up one with each new version while all versions share the same filing id.

How would that sort of system be standardized to OCD in this schema? Would we combine those two numbers into this field to create a composite id?

Contributor

@gordonje gordonje Dec 14, 2016

My understanding of the proposal (and I would really like @aepton to check me on this) is that this id would be what CAL-ACCESS calls the filing_id, sans amend_id. Then each amendment would be a different section to the filing, with the amend_id serving as that section's id.

First of all, am I reading this right?

If so, I'm confused about how/if amendment sections relate to contribution sections and expenditure sections. I think I get how this proposal would have us represent the fact that a given filing was amended and how many times it was amended, but I don't quite understand how this proposal would have us represent, for example, all of the contributions that were included on the first version of a filing separate from all the contributions that were included on the second version of a filing.

In practice for us in California, the contributions on the second version of the filing are mostly duplicates of the contributions found on the first version. There's even a transaction id that is unique within the different versions of a given filing. The typical differences include modifications to the amount or the contributor's name/info. Maybe the second version has additional contributions. There are surely at least a few cases where a contribution is removed.

Contributor Author

First, I think this system is agnostic as to how the filing IDs are generated - it just assumes them to be unique. I think it's fine for each state/municipality/whatever to be responsible for creating its own filing_ids for precisely the reason @palewire outlined - namely, each jurisdiction has a different, logically-consistent system (or at least, many do). As long as the IDs are unique (maybe give each jurisdiction a namespace) it doesn't matter how they're generated.

@gordonje to your first question, I envisioned each version of a filing as a separate Filing object, each with an Amendment section indicating it was overwriting the previous Filing. Each Filing would have all the Contributions and Expenditures associated with that version.

I added Amendments to the Questions section because I think how this should work is still fairly unclear to me. For instance, presumably each Contribution has a transaction ID (at least in some states). So with each version of a Filing, the same ContributionID will be present on a Contribution, and that Contribution's details might change, so how do we version the Contribution object without creating a really cumbersome system?

Contributor

OCD has typically strongly preferred OCD ids that are not human readable (with the notable exception of ocd division ids). But, it is important to preserve the source identifiers and that has typically been done with an identifier attribute. Notice this distinction between the id and identifier in voteevents

I did not know about this distinction between id and identifier. Is there any system by which the UUIDs for the id field are chosen or generated?

Contributor

It is not specified in any OCD spec (AFAIK). The de facto reference implementation of OCD right now is pupa, and it uses uuid.uuid1()

https://github.com/opencivicdata/pupa/blob/274cd0ddf9d550c5b20cd99d6d19720041d21222/pupa/scrape/base.py#L151
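
For illustration, an opaque id in the draft's ``ocd-cf-filing/{{uuid}}`` format can sit alongside the source identifiers like this. This is a sketch only: `make_ocd_id` is a hypothetical helper, not pupa's code, and the scheme names and values in the identifiers list are made up.

```python
import uuid


def make_ocd_id(type_name):
    # Opaque, non-human-readable id built from uuid1, as described above.
    return "ocd-{}/{}".format(type_name, uuid.uuid1())


filing = {
    "id": make_ocd_id("cf-filing"),
    # Source ids are preserved separately; scheme names here are invented.
    "identifiers": [
        {"scheme": "calaccess-filing-id", "identifier": "1234567"},
        {"scheme": "calaccess-amend-id", "identifier": "0"},
    ],
}
```

Keeping the regulator's filing id and amendment id as separate identifiers avoids having to encode a composite key into the OCD id itself.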

Hey, @aepton I totally agree amendments are complicated... Federal senate paper filings only contain lines that overwrite the originals (there's a lotta good reasons to not think about paper filings, but I bet this kinda partial overwriting thing isn't as standard as I want it to be). Also, there's sometimes inconsistency in how filing-level amendment is reported. Is it A amends B amends C, or A amends B, and then C subsequently amends B? One can fall back on whatever is reported, but it's potentially messier than one might hope.

Contributor Author

@fgregg ok, cool - added an original_id optional field to Filing (and a few other objects) to capture the state-generated ID in cases where we care about it, and otherwise I'm happy to assume the IDs of every object are not human-readable, beyond being namespaced.

@jsfenfen My current proposal is to have a field called invalidates_prior_versions that allows us to have amendment actions on filings which either wipe out everything previously disclosed, or don't. I'm thinking amendments should be handled using a combination of that field (if we even need it) and lists of all transactions, etc contained in that filing.

So if you have filing A with transactions B, C and D; and then you have amendment E with transactions B and D; then you've got a lot of duplication but you can also just look at the most recent filing to see all the current versions of the currently-disclosed transactions.
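
The example above can be sketched as follows, assuming each version stores a pointer to the version it replaces. The "supersedes" key and the dict layout are hypothetical stand-ins for whatever the spec ends up using.

```python
filings = [
    # Each version lists every transaction it discloses, not just a diff.
    {"id": "A", "supersedes": None, "transactions": ["B", "C", "D"]},
    {"id": "E", "supersedes": "A", "transactions": ["B", "D"]},
]


def current_filing(filings):
    # The current version is the one that no other filing supersedes.
    superseded = {f["supersedes"] for f in filings if f["supersedes"]}
    current = [f for f in filings if f["id"] not in superseded]
    if len(current) != 1:
        raise ValueError("expected exactly one current filing")
    return current[0]


latest = current_filing(filings)
```

Transaction C is duplicated nowhere and simply drops out: reading `latest["transactions"]` gives the currently disclosed set without consulting A at all.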

Like @jsfenfen alluded to, amendments are messy, and doing useful things like making totals and comparisons is more complicated than is ideal. It is often helpful to separate that out, which makes it easier to offer a wide array of summary information and to build out additional filters that guide people away from double-counting mistakes.

The FEC API is rolling out some improvements to our filing schema and there are a few things that are pretty useful and might be helpful concepts here. We are adding a latest_filing_id, and a most_recent boolean, which is a useful short cut to make sure you are looking at the right filing. We are also adding an array for the amendment chain of a filing. It is straightforward for electronic filers, and we are adding logic to infer it for paper filings.

Also, I would love to hear any specific suggestions any of you have to improve FEC efiling schemas, API schemas or even forms!
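
One way an amendment chain and a most-recent marker could be derived from reported "amends" pointers is sketched below. This is not the FEC's logic: the record layout, the function names, and the choice to order versions by filing date are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical records: "amends" is whatever the filer reported.
filings = {
    "B": {"amends": None, "filed": date(2016, 1, 15)},
    "A": {"amends": "B", "filed": date(2016, 2, 1)},
    "C": {"amends": "B", "filed": date(2016, 3, 1)},  # also points at B, not A
}


def root_of(filing_id, filings):
    # Follow "amends" pointers back to the original filing.
    seen = set()
    while filings[filing_id]["amends"] is not None:
        if filing_id in seen:
            raise ValueError("cycle in amendment pointers")
        seen.add(filing_id)
        filing_id = filings[filing_id]["amends"]
    return filing_id


def amendment_chain(filing_id, filings):
    # Group every version sharing a root and order by filing date, so both
    # "A amends B amends C" and "A and C both amend B" give one linear chain.
    root = root_of(filing_id, filings)
    members = [fid for fid in filings if root_of(fid, filings) == root]
    return sorted(members, key=lambda fid: filings[fid]["filed"])


chain = amendment_chain("A", filings)
most_recent = chain[-1]
```

Ordering by date sidesteps the question of which filing an amendment claims to amend, at the cost of trusting the filing dates.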

**optional**
Date (and possibly time) when filing period of coverage ends.

from_organization

Contributor

@gordonje gordonje Dec 14, 2016

I was initially confused by "from_organization" as a name for this. It sounded like the place where we represent the originator of the filing. I recognize that it's consistent with how OCD models the chamber from which a bill originates, but I wonder if the analog is really that strong.

Is there a reason we can't call it "regulator" or "regulatory_organization"?

Contributor Author

That's above my pay grade :) I agree "regulator" or something like that would be more intuitive, but I'm sensitive to the need to integrate this into the larger OCD world too.

Contributor

I can see it both ways. Let's go with regulator.

Open Civic Data-style id in the format ``ocd-cf-amendment/{{uuid}}``

filing_to_amend
Filing

Contributor

Why would this section require a Filing attribute while the others would not?

Contributor Author

That's an oversight; I should have removed it during a refactor. Nice catch :)

@gordonje
Contributor

gordonje commented Dec 14, 2016

I've added a few of my own comments (apologies for taking my sweet time!). Overall, really like the direction in which we are headed.

Also wanted to touch on FilingTypes: Are we imagining this as a means of modeling what we at CCDC call the Filing Forms (http://calaccess.californiacivicdata.org/documentation/calaccess-forms/)? These include:

If so, I wonder if "FilingForm" or "FilingFormat" might be a name that more specifically describes this object but is still general enough to cover all cases. Don't mean to quibble too much about names, but the business rules we are talking about (and their changes) are often most clearly described in reference to these forms. To that point: the instructions for completing and submitting the filings often are communicated directly on the forms. At least that has been my experience in CA.

Maybe there are other examples of FilingTypes I'm not thinking about. The raw CAL-ACCESS data also has a concept of "Statement Type", which is meant to represent a real mish-mash of categorizations, including

  • Quarterly Statements
  • Semi-annual Statements
  • Pre-election Statements
  • Supplemental Pre-election Statements
  • Special Odd-Year Campaign Statements
  • Termination Statements

But maybe a lot of this stuff is accounted for elsewhere, though maybe not as directly as we would like. For example, one can infer the length of the filing period from the filing_coverage_begin and end_date attributes.
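
A small sketch of that inference, assuming both coverage dates are available as dates. The function names and the day thresholds are arbitrary illustrations, not drawn from any statute or from the CAL-ACCESS data.

```python
from datetime import date


def coverage_days(start, end):
    # Inclusive count of days the filing covers.
    return (end - start).days + 1


def rough_period_label(days):
    # Thresholds are illustrative guesses only.
    if days <= 100:
        return "quarterly"
    if days <= 190:
        return "semi-annual"
    return "other"


days = coverage_days(date(2016, 1, 1), date(2016, 3, 31))
label = rough_period_label(days)
```

A label guessed this way would only be a hint; statement types such as pre-election or termination statements cannot be recovered from the dates alone.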

…ove away from specific transaction objects to a general model
@aepton
Contributor Author

aepton commented Dec 14, 2016

@gordonje yeah, I think the Filing Type object is meant to represent what you describe. I think Type is a better name than Form because many of these "forms" aren't really paper forms anymore; a lot is disclosed electronically. I don't want to tie us too closely to the notion of specific forms in particular jurisdictions; it's definitely meaningful to talk about what's in a specific campaign's report, or a last-minute filing report, or a quarterly report, or something like that, so it's worth modeling those in the DB. But they're essentially just bundles of claims responding to a given rule/requirement, so I prefer Type to Form.

@aepton
Contributor Author

aepton commented Dec 14, 2016

Curious what the next step should be - I can't merge this PR, but beyond that, should I start trying to implement a version of this spec, or move on to the campaign entities thing in PR 62? I'm new both to meaningful contributions to open source projects in general, and certainly to how y'all want to move forward on this particular project.

@gordonje
Contributor

@aepton yeah, come to think of it, the specific filing forms are probably way further into the weeds than most folks care to be (maybe I just want someone to come find me!). Especially, if you're doing analysis across states/jurisdictions. Categories like "quarterly filing" and "semi-annual filing" are plenty meaningful, and the forms are more like a means of satisfying legal requirements that say like "you have to submit this specific information every quarter" or whatever.

@fgregg
Contributor

fgregg commented Dec 21, 2016

@aepton I think there are a number of questions to be resolved, but I think we are at a point where progress will be furthered by an attempted implementation.

@palewire

@fgregg We are currently working through the process of refining our raw data into humanized models at the django-calaccess-processed-data repository.

Is there a particular piece we should try to implement as a first pass?

We're currently coming in at the problem around the edges and are closest to an "Election" model like has been discussed in #62.

@fgregg
Contributor

fgregg commented Dec 21, 2016

@palewire

The current reference implementations for OCD models live at https://github.com/opencivicdata/python-opencivicdata-django/tree/master/opencivicdata/models

It would be great to do a couple of things as you work on the calaccess data.

  1. Attempt to use the existing models in that repo for Organizations (including committees), Posts (which are what we typically call offices), and People.
  2. If you want to work on Election stuff next, take over the work that I just barely started on some models here: https://github.com/datamade/docs.opencivicdata.org/blob/elections/proposals/drafts/elections.rst

@jungshadow

If you want to work on Election stuff next, take over the work that I just barely started on some models here: https://github.com/datamade/docs.opencivicdata.org/blob/elections/proposals/drafts/elections.rst

👍 Really like that you stuck close to the @votinginfoproject specification on that proposal. You may already know this, but, in turn, @votinginfoproject is collaborating with NIST and their public working groups. Hopefully, all these different-but-related lines of work stay in sync.

@aepton
Contributor Author

aepton commented Dec 21, 2016

I'm happy to start work on a reference implementation of this proposal for Washington state. I have some work to do on my platform before I'm ready to start, but I should be able to get on it soon. Does anything else need to happen for this PR to be merged?

@fgregg
Contributor

fgregg commented Dec 21, 2016

I think it's ready to be merged as a proposal, but I don't have the permission bits for that.

attn @jamesturk @jpmckinney

@jpmckinney
Member

jpmckinney commented Dec 22, 2016

Note: I haven't read the full thread. Just reading the document and searching through the comments:

Filing

  1. cf-filing: Why not campaign-finance-filing to avoid opaqueness and/or ambiguity?
  2. committee and regulator: Any objection to generalizing to sender and recipient? "committee" is not a universal way of referring to the organizations submitting filings. This would also make sense if we later introduce other types of filings.
  3. coverage_begin_date and coverage_end_date: Why not simply valid_from and valid_until, or start_date and end_date? Anyway, start would be more consistent with other classes than begin.
  4. inciter: This seems like an unusual choice of term. Why not agent?
  5. invalidates_prior_versions: supersedes is more common / appropriate than invalidates.
  6. is_current: Boolean flags tend to be a bad pattern in data schemas. Data should strive to be 'add-only'. From what I can read of the discussion, the logic for this field is somewhat complicated. What are some alternatives to achieving the desired outcomes?
  7. relevant_election: Just make it election - the schema doesn't care about irrelevant elections. Anticipating future filing classes with which this class should have common properties, we can consider making this more generic, like context or legislative_context.
  8. responsible_person: Can someone expand on the semantics of this property? Is it different from 'contact person'?

Committee

This should be a subclass of Popolo's Organization. From what I can tell, only status is a new property (or should it be statuses since it is an array?). begin_date should be start_date to be consistent with all other classes. For sub-objects like this, note is more common for description.

I'm not sure why committee type is its own object. Perhaps in terms of the code implementation it makes sense to have a code list as an object, but in terms of the schema, a controlled vocabulary can be used for a committee's classification property.

With respect to a committee type's jurisdiction, that actually has to do with a registration that the committee has with a registrar in a particular jurisdiction. So, I would model that as a registration, not as some de-normalized property on a committee type.

Candidate Designation

I don't see any property on other classes that has designations as its range (possible value). How do other classes connect to this class?

Person

Person in Popolo is a real person, so you can't use it for corporations...

Filing Type

See comments about committee types.

Transaction

  • filing_version: Filing doesn't have a version property according to the spec. If we were to add it as a property, then all 'Filing' objects are, in fact, 'filing versions'. So, just filing would be fine for the property name here.
  • original_id: Use identifiers like other classes do for the same semantics as described here
  • type: Any reason not to use classification like other classes?
  • transaction_amount: Don't prefix class names to property names. Just use amount. Also, this should be an object with value and currency properties.
  • is_inkind: This should be added to the new amount object, and abbreviated to in_kind.
  • counterparty: "Person making contribution, or receiving expenditure"... Is this the agent providing or receiving the transaction? Why not a pair of properties sender and recipient?
  • memo: Unless this refers to some specific concept, note is used on all other classes
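
Taken together, the Transaction suggestions above might look something like the following. This is a sketch under assumptions: the class layout, the default currency, and the use of plain strings for filing, sender and recipient references are illustrative, not the spec.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Dict, List


@dataclass
class Amount:
    value: Decimal
    currency: str = "USD"   # assumed default, for illustration
    in_kind: bool = False   # moved here from is_inkind on the transaction


@dataclass
class Transaction:
    filing: str             # the filing (version) this transaction belongs to
    classification: str     # used in place of "type"
    amount: Amount
    sender: str             # pair of properties replacing "counterparty"
    recipient: str
    identifiers: List[Dict[str, str]] = field(default_factory=list)
    note: str = ""          # used in place of "memo"


t = Transaction(
    filing="filing-1",
    classification="contribution",
    amount=Amount(Decimal("250.00")),
    sender="person-1",
    recipient="committee-1",
)
```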

@jpmckinney
Member

@LindsayYoung Where can I see the FEC's schemas?

@jpmckinney
Member

Re: new elections models, see my comment popolo-project/popolo-spec#104 (comment) Anyway, let's not have an Elections discussion in this already-long issue! Please create a new issue.

@LindsayYoung

Great question @jpmckinney

Here are the API schemas:
https://api.open.fec.gov/swagger/

Click through to the metadata for the other FEC schemas http://www.fec.gov/data/DataCatalog.do

@aepton
Contributor Author

aepton commented Dec 22, 2016

@jpmckinney

Filing

  1. Went with campaignfinance there and elsewhere.
  2. I think "committee" is a better abstraction than "sender", but "filer" is better still, imho. No objection to "recipient", though I'm not sure who receives filings besides regulators. Changed.
  3. Switching to coverage_start_date. I don't think "valid" is quite the right word here - the notion here isn't one of validity, but simply the period of time a filing describes. I think "coverage" captures that, and "valid" introduces some ambiguity - if the "valid_end_date" is before the current date, is that filing now invalid?
  4. It was initially "responsible_person"; "inciter" is more general and "agent" is more general still. Changed.
  5. Changed.
  6. This is one area where the thing being captured is inherently hard to pin down. Almost every filing will have an is_current=True set when first filed, and the system is then responsible for keeping that flag up to date. I think that system compartmentalizes the responsibility of determining which set of filings is "current" without adding unnecessary complexity elsewhere, or introducing further ambiguity. We could remove it and leave all such decisions up to each user/dependent system, but I think that would cause more trouble than it would resolve. Alternatively, we could model a set of filings and their currentness apart from the Filing model, but I think that adds more complexity as well without being a better solution.
  7. Changed to "election". I'd like to avoid overgeneralizing from the get-go; this is easy to make more general, should we eventually go down that path, but I think "election" is clear and concise in this context in a way "context" isn't, and certainly "legislative context" isn't.
  8. It's really the same as "agent" and I'm not sure why we kept it separate, now. I'll remove it; the "agent" field in the "actions" should be able to capture this comprehensively.

Committee

Changed to start_date, note and statuses. Added note about making this a subclass of Organization; should we just provide the fields that are different here then?

I think committee_type should be its own object because any given jurisdiction will have several different types that don't necessarily translate cleanly across jurisdictions. And in cases where they do, the rules will nevertheless be different - candidate committees in WA have different rules apply to them than do candidate committees in IL, for instance.

Registration filings should be captured by the Filing object; the jurisdiction filed here is meant to reflect which locality(ies) a committee belongs to, and hence, which laws apply to it (among other things).

Candidate Designation

That was an oversight; added a field for that to Committee.

Person

What should we use here, then? Subclass of Popolo Person for "campaign finance persons" who, thanks to our Supreme Court, may in fact be corporations? This is an ambiguity not easily resolved; most of the time from what I've seen, looking at a given transaction it's impossible to tell if it's a person or a corporation unless you're a human using human heuristics that I'm uncomfortable emulating in this system.

Filing Type

These are useful to model the actual filings committees submit, which have meaning in various contexts, and may help us construct the is_current_filing chain (certain types get superseded by other types, in certain states, at certain times of day, with Venus in the appropriate phase, etc.) And these filings vary titanically from state to state, so I think they're worth modeling as first-class objects.

Transaction

  1. Well, I think this is more specifically saying, "to which action on a Filing does this transaction belong" but the description didn't make that clear, so I updated it.
  2. Done.
  3. No reason; fixed.
  4. Nice. Fixed.
  5. Fixed.
  6. Fixed.
  7. Fixed.

@jsfenfen

@jpmckinney The spec for the actual forms that filers submitted is detailed here http://www.fec.gov/elecfil/vendors.shtml, though it helps to know a bit about the rules for submitting them.

@jsfenfen

@aepton @jpmckinney +1 for filer rather than committee, because in some jurisdictions folks who have to file campaign finance reports are explicitly not committees, and do not have to register as such (and there's a number of ongoing lawsuits arguing that some filers really should be committees subject to committee rules, etc.)

@jpmckinney
Member

jpmckinney commented Dec 28, 2016

Is this spec targeting only the FEC? My understanding was the goal was broader.

Otherwise I can do one more look over and merge.

@palewire

@jpmckinney This pull request was started by @aepton after we discussed common challenges dealing with Washington state and California campaign finance data. Our goal is for this schema to work with statehouses as well as the federal data as much as possible.

@aepton
Contributor Author

aepton commented Dec 31, 2016

@jpmckinney Yeah, +1 to what @palewire said. I'd love it to work with any campaign finance situation, ideally - the Toronto civic data folks seemed interested, for instance.

@aepton
Contributor Author

aepton commented Jan 20, 2017

Anything else need to be done for this, or can it be accepted?

@jpmckinney
Member

@aepton I was going to do one more read-through - ideally this weekend.

@aepton
Contributor Author

aepton commented Feb 1, 2017

Just pinging this :)

@jpmckinney jpmckinney merged commit 3d59a96 into opencivicdata:master Feb 3, 2017
@jpmckinney
Member

Merging the draft 🎉

Going to follow-up in new issues/PRs.

@jpmckinney
Member

Who are the primary contacts among the contributors to this thread for future modification of this OCDEP?

@fgregg
Contributor

fgregg commented Feb 3, 2017

@jpmckinney I'm not sure what you are asking?

@jpmckinney
Member

jpmckinney commented Feb 3, 2017

I just want to know whom to keep in the loop. I don't want to @ everyone in every issue/PR I open unless everyone wants me to.

10 participants